Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features
Authors
Abstract
In this paper, we propose the use of spatial and harmonic features in combination with a long short-term memory (LSTM) recurrent neural network (RNN) for the automatic sound event detection (SED) task. Real-life sound recordings typically contain many overlapping sound events, which makes them hard to recognize from mono channel audio alone. Human listeners successfully recognize mixtures of overlapping sound events by using pitch cues and by exploiting the stereo (multichannel) audio signal available at their ears to spatially localize these events. Traditionally, SED systems have used only mono channel audio; motivated by the human listener, we propose to extend them to use multichannel audio. The proposed SED system is compared against the state-of-the-art mono channel method on the development subset of the TUT sound events detection 2016 database [1]. The use of spatial and harmonic features is shown to improve the performance of SED.
Similar resources
Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features
In this paper, we propose a stacked convolutional and recurrent neural network (CRNN) with a 3D convolutional neural network (CNN) in the first layer for the multichannel sound event detection (SED) task. The 3D CNN enables the network to simultaneously learn the inter- and intra-channel features from the input multichannel audio. In order to evaluate the proposed method, multichannel audio datas...
Spatial audio and sensory evaluation techniques – context, history and aims
Spatial sound reproduction gives rise to new challenges for those trying to evaluate sensory features contributing to perceived quality. Recent technical developments have enabled the delivery of sophisticated multichannel audio signals to consumers, over links that range very widely in quality, requiring decisions to be made about the tradeoffs between different aspects of audio quality. Spati...
Comparison of Quality Degradation Effects Caused by Limitation of Bandwidth and by Down-mix Algorithms in Consumer Multichannel Audio Delivery Systems
The comparative effect on audio quality of controlled multichannel audio bandwidth limitation and selected downmix algorithms was examined. The investigation focused on the standard 5.1 multichannel audio set-up (Rec. ITU-R BS.775-1) and was limited to the optimum listening position. The obtained results indicate that, in the case of multichannel audio systems, spatial quality is less important t...
Parametric Coding of Stereo Audio Based on Principal Component Analysis
Low bit rate parametric coding of multichannel audio is mainly based on Binaural Cue Coding (BCC). Another multichannel audio processing method, called upmix, can also be used to deliver multichannel audio, typically 5.1 signals, at low data rates. More precisely, we focus on an existing upmix method based on Principal Component Analysis (PCA). This PCA-based upmix method aims to blindly create a re...
Psychoacoustic-based quantisation of spatial audio cues
The derivation of spatial cues representing source localisation information is a typical component of multichannel spatial audio coders. Efficient compression of spatial cues based on psychoacoustic localisation features is investigated. Results show that the proposed quantisation approach for spatial cue compression achieves bit-rates of less than 6 kbit/s while preserving critical source loca...
Journal title:
- CoRR
Volume abs/1706.02293, issue -
Pages -
Publication date 2016